LINEAR PROGRAMMING TECHNIQUES FOR REGRESSION ANALYSIS


by 

Harvey  M.  Wagner 
Stanford  University 


1.  Introduction

Karst  [5]  has  recently  suggested  an  iterative  procedure  "for  finding 
a  straight  line  of  best  fit  to  a  set  of  two  dimensional  points  such  that 
the  sum  of  the  absolute  values  of  the  vertical  deviations  of  the  points 
from  the  line  is  a  minimum."  It  is  well  known  that  the  general  p  +  1 
dimensional  version  of  this  problem  may  be  exactly  formulated  as  a  linear 
programming  model  consisting  of  n  equations,  where  n  is  the  number  of 
observations.  By  employing  the  fundamental  dual  theorem  [1,  6,  8]  in  linear 
programming,  we  shall  show  how  the  problem  can  be  solved  by  a  p  equation 
linear programming model with bounded variables [2, 3, 9]. Secondly, we
shall demonstrate how a regular p + 1 equation linear programming model
can be utilized to find a line of best fit according to a Chebyshev criterion
[4], i.e., a line (or hyperplane) which minimizes the maximum of the absolute
vertical deviations from the sample points.

2.  Minimizing  the  Sum  of  Absolute  Deviations 
Let  X  denote  an  n  x  p  dimensional  matrix,  where  the  columns  consist 
of  n  observational  measurements  on  p  "independent"  variables,  and  Y  denote 
an  n-dimensional  column  vector  of  measurements  on  the  "dependent"  variable. 
We wish to find a p-dimensional column vector b such that

$$
\begin{aligned}
Xb + Ie_1 - Ie_2 &= Y, \qquad e_1, e_2 \ge 0 \\
\text{minimize } E &= (1\ 1\ \cdots\ 1)(e_1 + e_2)
\end{aligned}
\tag{1}
$$


where I is an n x n identity matrix. We interpret $e_1$ and $e_2$ as
n-dimensional column vectors of vertical deviations "below" and "above"
the fitted line; i.e., ($e_1 + e_2$) is the vector of absolute deviations
between the fit Xb and Y (by the nature of the model, it is clear that
the j-th components of $e_1$ and $e_2$ cannot both be strictly positive
in  an  optimal  solution).  The  solution  to  our  problem  yields  the 
regression  equation 


$$
(x_1\ x_2\ \cdots\ x_p)
\begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_p \end{pmatrix}
= xb = y. \tag{2}
$$
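As a small illustration of how (1) scores a candidate b, the following Python sketch splits the residuals Y - Xb into two non-negative vectors and sums their components. The function names and the three-point data set are illustrative assumptions, not taken from the paper, and the sketch only evaluates the objective E for a given b; it does not carry out the minimization.

```python
def split_deviations(rows, y, b):
    """Return (e1, e2), both non-negative, with Xb + e1 - e2 = Y."""
    e1, e2 = [], []
    for row, yi in zip(rows, y):
        fitted = sum(xj * bj for xj, bj in zip(row, b))
        residual = yi - fitted
        e1.append(max(0.0, residual))   # positive part of Y - Xb
        e2.append(max(0.0, -residual))  # positive part of Xb - Y
    return e1, e2


def sum_abs_deviations(rows, y, b):
    """Objective E of model (1): the sum of the components of e1 + e2."""
    e1, e2 = split_deviations(rows, y, b)
    return sum(e1) + sum(e2)


# Hypothetical data: three observations on one independent variable.
rows = [[1.0], [2.0], [3.0]]
y = [1.0, 1.0, 4.0]
e1, e2 = split_deviations(rows, y, [1.0])
print(e1, e2, sum_abs_deviations(rows, y, [1.0]))
```

Because each residual is assigned wholly to one of the two vectors, at most one of the j-th components is positive, which is the property the text notes for an optimal solution.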


Note that if we wish the left hand side of (2) to include a
coefficient for the intercept of the y axis to be determined by the linear
fit, then we can let $x_p = 1$, and the p-th column of X be a vector of
ones. We may force the fitted line to pass through some point, the usual
example being the set of sample means, either by adding to (1) the linear
restriction

$$(x_1^0\ x_2^0\ \cdots\ x_p^0)\, b = y^0 \tag{3}$$
or by the usual least squares approach of subtracting each coordinate of
the point, in our example the sample mean for each variable, from all the
corresponding observations (including y) and then by fitting (1) without
a  y-intercept  coefficient;  the  latter  approach  simply  consists  of  shifting 
the  origin  of  the  axes  in  a  p-dimensional  space  to  the  selected  point,  and 
then  of  fitting  the  line  (hyperplane)  through  the  new  origin. 
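The second approach can be sketched in a few lines of Python. The data and the helper names below are hypothetical, chosen only for illustration. The point of the sketch is that any slope fitted to the deviations without an intercept corresponds, in the original coordinates, to a line through the sample means.

```python
def center(values):
    """Return the deviations of the values about their mean, and the mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values], mean


# Hypothetical raw observations on one independent variable (not from the paper).
x_raw = [1.0, 2.0, 4.0, 5.0]
y_raw = [2.0, 3.0, 7.0, 8.0]
x_dev, x_mean = center(x_raw)
y_dev, y_mean = center(y_raw)


def fitted_in_original_coordinates(x, b):
    """Map a no-intercept fit y_dev = b * x_dev back to the original axes."""
    return y_mean + b * (x - x_mean)


# Whatever slope b is fitted to (x_dev, y_dev), the line passes through the means.
print(fitted_in_original_coordinates(x_mean, 0.75) == y_mean)
```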

If  it  is  desirable,  the  linear  programming  model  (1)  can  be  restricted 
further  to  permit  only  non-negative  values  for  some  or  all  of  the  components 
of  b,  and  to  force  b  to  satisfy  additional  linear  constraints.  It  is 
noteworthy  that  collinearity  in  X  (even  to  the  degree  that  two  columns  of 
X  are  identical)  will  not  cause  a  failure  in  the  algorithm  for  (1).  One 
drawback of the model is evident: when the number of observations n is
sizeable, (1) becomes computationally unwieldy.

We shall now transform (1) into a more manageable dual problem which
will yield the optimal b as a byproduct. To start, we shall assume we
have added to (1) the restriction b ≥ 0. The fundamental dual relationship
in linear programming [1, 6, 8] asserts that a solution to (1) can be found by
considering the linear programming model

$$X'd \le 0 \tag{4a}$$

$$Id \le (1\ 1\ \cdots\ 1)', \qquad -Id \le (1\ 1\ \cdots\ 1)' \tag{4b}$$

$$\text{maximize } G = Y'd \tag{4c}$$


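where X' is the transpose of X, Y' the transpose of Y, and d an
n-dimensional column vector of "dual variables" which are unrestricted
in sign (because (1) consists of a set of equations). Model (4), as it
appears, is even a larger problem than (1), since it consists of p + 2n
relations. To reduce the problem to a model in p equations and n bounded
variables we let

$$f = d + \mathbf{1}, \tag{5}$$

where $\mathbf{1}$ is an n-dimensional column vector of ones. Then (4) is
equivalent to

$$X'f \le X'\mathbf{1} \tag{6a}$$

$$0 \le f \le 2 \cdot \mathbf{1} \tag{6b}$$

$$\text{maximize } G = Y'f - Y'\mathbf{1} \tag{6c}$$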




Upon  appending  a  set  of  slack  variables  to  (6a)  and  dropping  the  constant 
on  the  right  side  of  (6c),  we  may  solve  (6)  by  one  of  the  simplex 
algorithms for bounded variables [2, 3, 9]. If X and Y are deviations
of sample values from their means, then the right hand side of (6a) is
a vector of zeros, and the constant in (6c) is zero. Denoting the basis
of the optimal solution of (6) by B (which may include slack vectors),
and the associated coefficients in (6c) by $r_B$, we have

$$b = (B^{-1})' r_B. \tag{7}$$


No extra computations are needed to find (7). In the original simplex
method b appears in the $(z_j - c_j)$ row of the final simplex tableau under
the columns for the slack vectors [1, 8]; in the revised simplex method
(7) is the "shadow price" vector for the optimal solution [7]. The
optimal value of G is the minimized sum of absolute deviations.
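The closing remark, that the optimal G equals the minimized sum of absolute deviations, can be checked numerically. The Python sketch below is only an illustration under stated assumptions. It takes the twelve observations listed as (12) in the numerical example and the slope 5.6/8.5 reported there, and it builds a vector d directly from the signs of the residuals rather than by running a simplex algorithm. It assumes, from the standard duality relationship, that an optimal d has every component between -1 and +1 and satisfies X'd = 0 when the fitted coefficient is positive. The variable names are ours.

```python
# Data (12) from the numerical example (deviations about the sample means).
x = [-12.5, -8.5, -6.5, -3.5, -2.5, -1.5, -0.5, 2.5, 4.5, 8.5, 8.5, 11.5]
y = [-8.4, -5.4, 3.6, -2.4, -4.4, 1.6, -0.4, -0.4, -2.4, 3.6, 5.6, 9.6]

k = 10                     # 0-based index of the eleventh point (8.5, 5.6)
b = y[k] / x[k]            # the slope reported in (14), about .659

residuals = [yi - b * xi for xi, yi in zip(x, y)]
primal = sum(abs(r) for r in residuals)        # objective E of (1)

# Build d: +1 or -1 according to the sign of each residual off the line;
# the component for the point on the line is chosen so that X'd = 0.
d = [0.0] * len(x)
for i, r in enumerate(residuals):
    if i != k:
        d[i] = 1.0 if r > 0 else -1.0
d[k] = -sum(x[i] * d[i] for i in range(len(x)) if i != k) / x[k]

dual = sum(yi * di for yi, di in zip(y, d))    # objective G = Y'd
x_dot_d = sum(xi * di for xi, di in zip(x, d)) # should vanish
print(primal, dual, x_dot_d, d[k])
```

If the assumptions hold, the two printed objective values agree up to rounding, and the one component of d that is not +1 or -1 lies strictly between the bounds.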

When we drop our assumption that b be non-negative and allow the
components of b to take on any sign, we modify (6) by changing the
inequalities (6a) to equalities, and introduce a set of artificial variables
having an arbitrarily high cost to initiate one of the simplex algorithms.
The optimal b remains (7), i.e., the shadow price vector in the revised
simplex method or the $z_j$ elements of the final simplex tableau under the
columns for the artificial vectors [1].

In summary, we can solve for b in (1) by applying a simplex
algorithm for bounded variables to the p equation model (6). Although
the  mathematical  manipulation  underlying  the  transformation  of  problems 
appears  involved,  the  computational  procedure  required  to  solve  (6)  is 
relatively  straightforward,  but  somewhat  laborious. 


3.  Minimizing the Maximum Absolute Deviation
The most bothersome aspect of the approach in the previous section
is the requirement of a linear programming algorithm for bounded variables,
as  such  techniques  are  (slightly)  more  difficult  to  perform  than  the 
standard  simplex  algorithm.  We  may  eliminate  the  drawback  if  we  are 
willing  to  accept  a  Chebyshev  criterion  for  best  fit.  Our  model  in  this 
case  is  to  find  a  p  dimensional  column  vector  b  such  that 


$$Xb - Y \le (e\ e\ \cdots\ e)' \tag{8a}$$

$$-Xb + Y \le (e\ e\ \cdots\ e)' \tag{8b}$$

$$\text{minimize } e \ge 0 \tag{8c}$$


Examination of (8) will reveal that e is the maximum absolute deviation.
The equations (8) are reminiscent of a linear programming formulation for
the  minimax  problem  in  two-person  zero-sum  games,  and  we  shall  use  a 
similar  approach  for  the  solution.  An  equivalent  expression  for  (8)  is 

$$
\begin{pmatrix} -X & \mathbf{1} \\ X & \mathbf{1} \end{pmatrix}
\begin{pmatrix} b \\ e \end{pmatrix}
\ge
\begin{pmatrix} -Y \\ Y \end{pmatrix}
\tag{9}
$$

minimize e ≥ 0,

where $\mathbf{1}$ is an n-dimensional column vector of ones. Our previous
remarks concerning additional linear constraints on b apply here equally
well.

Assuming for the moment that we wish to impose the restriction b ≥ 0,
we convert the 2n equation model (9) into its dual form, which contains only
p + 1 equations

$$(-X'\ \ X') \begin{pmatrix} h_1 \\ h_2 \end{pmatrix} \le 0 \tag{10a}$$

$$(\mathbf{1}'\ \ \mathbf{1}') \begin{pmatrix} h_1 \\ h_2 \end{pmatrix} \le 1, \qquad h_1, h_2 \ge 0 \tag{10b}$$

$$\text{maximize } M = (-Y'\ \ Y') \begin{pmatrix} h_1 \\ h_2 \end{pmatrix} \tag{10c}$$

where 0 is a p-dimensional column vector of zeros. The vectors $h_1$ and
$h_2$ are n-dimensional columns; if a component of $h_1$ ($h_2$) is positive in the
optimal solution of (10), then the maximum deviation occurs at the corresponding
point or equation in (8), and this point will lie "below" ("above") the fitted
line. Analogous to our result in (7),


$$\begin{pmatrix} b \\ e \end{pmatrix} = (B^{-1})' r_B \tag{11}$$

where B denotes the optimal basis for (10), and $r_B$ the coefficients
in (10c) corresponding to the variables in B; and exactly as before, the
solution (11) is a byproduct of the simplex method.

If  we  drop  the  assumption  that  b  be  non-negative,  we  need  only 
change  (10a)  to  equalities,  and  results  analogous  to  those  in  the  previous 
section  continue  to  hold. 
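For a single independent variable and a line through the origin, the criterion in (8) can be illustrated without any linear programming at all. The Python sketch below is a brute-force check, not the simplex procedure of this section: it tries every slope at which two deviations are equal in size and keeps the one with the smallest maximum absolute deviation. The function names and the three-point data set are hypothetical.

```python
from itertools import combinations


def max_abs_deviation(x, y, b):
    """The quantity e of (8) for the one-variable fit y = b * x."""
    return max(abs(yi - b * xi) for xi, yi in zip(x, y))


def chebyshev_slope(x, y):
    """Brute-force minimax slope through the origin; returns (b, e).

    The maximum deviation is a convex, piecewise linear function of b, so it
    suffices to try the slopes at which two deviations have equal size.
    """
    candidates = [yi / xi for xi, yi in zip(x, y) if xi != 0]
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        if xi != xj:
            candidates.append((yi - yj) / (xi - xj))  # equal size, same sign
        if xi + xj != 0:
            candidates.append((yi + yj) / (xi + xj))  # equal size, opposite sign
    if not candidates:
        raise ValueError("every x is zero, so the fit does not depend on b")
    b = min(candidates, key=lambda c: max_abs_deviation(x, y, c))
    return b, max_abs_deviation(x, y, b)


# Hypothetical three-point data set (not from the paper).
print(chebyshev_slope([1.0, 2.0, 3.0], [1.0, 3.0, 2.0]))  # prints (1.0, 1.0)
```

The enumeration grows with the square of the number of observations, so it is useful only as a check on small problems.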

4.  A Numerical Example
Karst  [5]  examines  the  following  data 


$$
\begin{aligned}
X' &= [-12.5\ \ -8.5\ \ -6.5\ \ -3.5\ \ -2.5\ \ -1.5\ \ -0.5\ \ 2.5\ \ 4.5\ \ 8.5\ \ 8.5\ \ 11.5] \\
Y' &= [-8.4\ \ -5.4\ \ 3.6\ \ -2.4\ \ -4.4\ \ 1.6\ \ -0.4\ \ -0.4\ \ -2.4\ \ 3.6\ \ 5.6\ \ 9.6],
\end{aligned}
\tag{12}
$$

which comprise deviations of the original data about their sample means.
He finds the least squares fit to be

$$y = .539\, x, \tag{13}$$

and the fit for the minimized sum of absolute deviations to be

$$y = .659\, x. \tag{14}$$

As  we  have  argued,  (14)  can  be  obtained  by  (6),  where  specifically  we 
would  find 

$$b = (B^{-1})' r_B = (1/8.5)\, 5.6 = .659. \tag{7'}$$
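The value .659 can be confirmed independently of the simplex method. With a single coefficient, the sum of absolute deviations is a convex, piecewise linear function of b whose kinks lie at the ratios y_i / x_i, so some ratio is optimal. The Python sketch below simply evaluates the objective at every such ratio for the data in (12). It is a brute-force check under that assumption, not the procedure of model (6), and the names are ours.

```python
# Data (12): deviations of the original data about their sample means.
x = [-12.5, -8.5, -6.5, -3.5, -2.5, -1.5, -0.5, 2.5, 4.5, 8.5, 8.5, 11.5]
y = [-8.4, -5.4, 3.6, -2.4, -4.4, 1.6, -0.4, -0.4, -2.4, 3.6, 5.6, 9.6]


def sum_abs_dev(b):
    """Sum of absolute vertical deviations from the line y = b * x."""
    return sum(abs(yi - b * xi) for xi, yi in zip(x, y))


# Candidate slopes: lines through the origin and one data point.
candidates = [yi / xi for xi, yi in zip(x, y) if xi != 0]
b_best = min(candidates, key=sum_abs_dev)
print(round(b_best, 3))  # prints 0.659
```

The minimizing line passes through the eleventh observation (8.5, 5.6), which is why the basis in (7') consists of the single element 8.5 with coefficient 5.6.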


The solution by model (10) yields

$$B = \begin{pmatrix} -6.5 & 11.5 \\ 1 & 1 \end{pmatrix}, \qquad r_B' = [3.6\ \ 9.6],$$

and consequently

$$\begin{pmatrix} b \\ e \end{pmatrix} = (B^{-1})' r_B = \begin{pmatrix} .333 \\ 5.767 \end{pmatrix}. \tag{11'}$$

That is, the Chebyshev fit is

$$y = .333\, x;$$

since the vectors in B correspond to variables in $h_2$, the third and
last sample points in (12) will lie above the fitted line and assume the
maximum vertical deviation from it of 5.767.
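The arithmetic behind (11') can be retraced in a few lines of Python. The sketch assumes that the optimal basis consists of the columns of (10) belonging to the third and twelfth observations, each column holding the x value and a 1, so that (b, e) = (B^{-1})' r_B is the solution of the two equations x_i b + e = y_i. It then checks that no other observation deviates from the fitted line by more than e. The variable names are ours.

```python
# Data (12): deviations of the original data about their sample means.
x = [-12.5, -8.5, -6.5, -3.5, -2.5, -1.5, -0.5, 2.5, 4.5, 8.5, 8.5, 11.5]
y = [-8.4, -5.4, 3.6, -2.4, -4.4, 1.6, -0.4, -0.4, -2.4, 3.6, 5.6, 9.6]

i, j = 2, 11   # 0-based indices of the third and the last observation

# Solve B' z = r_B for z = (b, e):
#   x[i] * b + e = y[i]
#   x[j] * b + e = y[j]
b = (y[j] - y[i]) / (x[j] - x[i])
e = y[i] - x[i] * b

deviations = [yi - b * xi for xi, yi in zip(x, y)]
largest = max(abs(dv) for dv in deviations)
print(round(b, 3), round(e, 3), round(largest, 3))  # prints 0.333 5.767 5.767
```

Both basic observations have positive deviations, in agreement with the statement that they lie above the fitted line.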